Linear function approximator as a neural network.
What must we add to do logistic regression?
Just some post-processing: $\Wv$ plays the role of $\betav$, and we use $K-1$ output units instead of $K$.
Any thoughts on how to do nonlinear logistic regression?
Here we repeat the derivation using $\Wv$ instead of $\betav$.
Let's first develop this for $ k < K$.
$$ \begin{align*} \frac{\partial g_{n,k}}{\partial \Wv_{d,j}} &= \frac{\partial}{\partial \Wv_{d,j}} \eby{k} \left (1 + \sum_{m=1}^{K-1} \eby{m} \right )^{-1} \\ & = \eby{k} \frac{\partial y_{n,k}}{\partial \Wv_{d,j}} \left (1+ \sum_{m=1}^{K-1} \eby{m} \right )^{-1} + \eby{k} (-1) \left (1+ \sum_{m=1}^{K-1} \eby{m} \right )^{-2} \sum_{m=1}^{K-1} \eby{m} \frac{\partial y_{n,m}}{\partial \Wv_{d,j}}\\ & = \frac{\eby{k}}{1+\sum_{m=1}^{K-1} \eby{m}} \frac{\partial y_{n,k}}{\partial \Wv_{d,j}} - \frac{\eby{k}}{1+\sum_{m=1}^{K-1} \eby{m}} \sum_{m=1}^{K-1} \frac{\eby{m}}{1+\sum_{m=1}^{K-1} \eby{m}} \frac{\partial y_{n,m}}{\partial \Wv_{d,j}}\\ & = g_{n,k} \frac{\partial y_{n,k}}{\partial \Wv_{d,j}} - g_{n,k} \sum_{m=1}^{K-1} g_{n,m} \frac{\partial y_{n,m}}{\partial \Wv_{d,j}}\\ & = g_{n,k} \left ( \frac{\partial y_{n,k}}{\partial \Wv_{d,j}} - \sum_{m=1}^{K-1} g_{n,m} \frac{\partial y_{n,m}}{\partial \Wv_{d,j}} \right ) \end{align*} $$And again for $k = K$.
$$ \begin{align*} \frac{\partial g_{n,k}}{\partial \Wv_{d,j}} &= \frac{\partial}{\partial \Wv_{d,j}} \left (1 + \sum_{m=1}^{K-1} \eby{m} \right )^{-1} \\ & = (-1) \left (1+ \sum_{m=1}^{K-1} \eby{m} \right )^{-2} \sum_{m=1}^{K-1} \eby{m} \frac{\partial y_{n,m}}{\partial \Wv_{d,j}}\\ & = - \frac{1}{1+\sum_{m=1}^{K-1} \eby{m}} \sum_{m=1}^{K-1} \frac{\eby{m}}{1+\sum_{m=1}^{K-1} \eby{m}} \frac{\partial y_{n,m}}{\partial \Wv_{d,j}}\\ & = - g_{n,k} \sum_{m=1}^{K-1} g_{n,m} \frac{\partial y_{n,m}}{\partial \Wv_{d,j}}\\ \end{align*} $$The last expression can be made to look like the last one for $k<K$ by including $\frac{\partial y_{n,K}}{\partial \Wv_{d,j}} = 0$:
$$ \begin{align*} \frac{\partial g_{n,k}}{\partial \Wv_{d,j}} &= g_{n,k} \left ( \frac{\partial y_{n,k}}{\partial \Wv_{d,j}} - \sum_{m=1}^{K-1} g_{n,m} \frac{\partial y_{n,m}}{\partial \Wv_{d,j}} \right ) \end{align*} $$Substituting for $\frac{\partial g_{n,k}}{\partial \Wv_{d,j}}$ in $\frac{\partial LL(\Wv)}{\partial \Wv_{d,j}}$ for all $k$:
$$ \begin{align*} \frac{\partial g_{n,k}}{\partial \Wv_{d,j}} &= g_{n,k} \left ( \frac{\partial y_{n,k}}{\partial \Wv_{d,j}} - \sum_{m=1}^{K-1} g_{n,m} \frac{\partial y_{n,m}}{\partial \Wv_{d,j}} \right )\\ \frac{\partial LL(\Wv)}{\partial \Wv_{d,j}} & = \sum_{n=1}^N \sum_{k=1}^K \frac{t_{n,k}}{g_{n,k}} \frac{\partial g_{n,k}}{\partial \Wv_{d,j}}\\ & = \sum_{n=1}^N \sum_{k=1}^K \frac{t_{n,k}}{g_{n,k}} g_{n,k} \left ( \frac{\partial y_{n,k}}{\partial \Wv_{d,j}} - \sum_{m=1}^{K-1} g_{n,m} \frac{\partial y_{n,m}}{\partial \Wv_{d,j}} \right )\\ & = \sum_{n=1}^N \sum_{k=1}^K t_{n,k} \left ( \frac{\partial y_{n,k}}{\partial \Wv_{d,j}} - \sum_{m=1}^{K-1} g_{n,m} \frac{\partial y_{n,m}}{\partial \Wv_{d,j}}\right )\\ & = \sum_{n=1}^N \left ( \sum_{k=1}^K t_{n,k} \frac{\partial y_{n,k}}{\partial \Wv_{d,j}} - \sum_{k=1}^K t_{n,k} \sum_{m=1}^{K-1} g_{n,m} \frac{\partial y_{n,m}}{\partial \Wv_{d,j}} \right )\\ & = \sum_{n=1}^N \left ( \sum_{k=1}^K t_{n,k} \frac{\partial y_{n,k}}{\partial \Wv_{d,j}} - \sum_{m=1}^{K-1} g_{n,m} \frac{\partial y_{n,m}}{\partial \Wv_{d,j}} \right ) \;\;\;\text{ because } \sum_{k=1}^K t_{n,k} = 1\\ & = \sum_{n=1}^N \sum_{k=1}^K (t_{n,k} - g_{n,k}) \frac{\partial y_{n,k}}{\partial \Wv_{d,j}}\\ & = \sum_{n=1}^N \sum_{k=1}^{K-1} (t_{n,k} - g_{n,k}) \frac{\partial y_{n,k}}{\partial \Wv_{d,j}} \;\;\;\text{ because } \frac{\partial y_{n,k}}{\partial \Wv_{d,j}} = 0 \text{ for } k=K \end{align*} $$General gradient:
$$ \begin{align*} \frac{\partial LL(\Wv)}{\partial \Wv_{d,j}} & = \sum_{n=1}^N \sum_{k=1}^{K-1} (t_{n,k} - g_{n,k}) \frac{\partial y_{n,k}}{\partial \Wv_{d,j}} \end{align*} $$For linear logistic regression, $y_{n,j} = \xv_n^T \Wv_{*,j}$, so $\frac{\partial y_{n,k}}{\partial \Wv_{d,j}}$ is nonzero only when $k=j$, giving
$$ \begin{align*} \frac{\partial LL(\Wv)}{\partial \Wv_{d,j}} & = \sum_{n=1}^N (t_{n,j} - g_{n,j}) \frac{\partial y_{n,j}}{\partial \Wv_{d,j}}\\ & = \sum_{n=1}^N \left ( t_{n,j} - g_{n,j} \right ) x_{n,d} \end{align*} $$Now for nonlinear logistic regression. First, the general form again:
$$ \begin{align*} \frac{\partial LL(\Wv)}{\partial \Wv_{d,j}} & = \sum_{n=1}^N \sum_{k=1}^{K-1} (t_{n,k} - g_{n,k}) \frac{\partial y_{n,k}}{\partial \Wv_{d,j}} \end{align*} $$Now $y_{n,j}$ depends on $\Vv$ and $\Wv$, so
$$ \begin{align*} \frac{\partial LL(\Vv,\Wv)}{\partial \Vv_{d,m}} & = \sum_{n=1}^N \sum_{k=1}^{K-1} \left ( t_{n,k} - g_{n,k} \right ) \frac{\partial y_{n,k}}{\partial \Vv_{d,m}}\\ \frac{\partial LL(\Vv,\Wv)}{\partial \Wv_{m,j}} & = \sum_{n=1}^N \sum_{k=1}^{K-1} \left ( t_{n,k} - g_{n,k} \right ) \frac{\partial y_{n,k}}{\partial \Wv_{m,j}}\\ \frac{\partial LL(\Vv,\Wv)}{\partial \Wv_{m,j}} & = \sum_{n=1}^N \left ( t_{n,j} - g_{n,j} \right ) \frac{\partial y_{n,j}}{\partial \Wv_{m,j}} \end{align*} $$But, thank goodness, we have already calculated $\frac{\partial y_{n,k}}{\partial \Vv_{d,m}}$ and $\frac{\partial y_{n,k}}{\partial \Wv_{m,k}}$ in our neural network days. This becomes clearer when we compare the above with the derivatives of the mean squared error with respect to the weights of a neural network for regression problems.
$$ \begin{align*} E &= \frac{1}{NK} \frac{1}{2} \sum_{n=1}^N \sum_{k=1}^K (t_{n,k} - y_{n,k})^2\\ \frac{\partial E}{\partial \Vv_{d,m}} & = - \frac{1}{NK} \sum_{n=1}^N \sum_{k=1}^K (t_{n,k} - y_{n,k}) \frac{\partial y_{n,k}}{\partial \Vv_{d,m}}\\ \frac{\partial E}{\partial \Wv_{m,j}} & = - \frac{1}{NK} \sum_{n=1}^N \sum_{k=1}^K (t_{n,k} - y_{n,k}) \frac{\partial y_{n,k}}{\partial \Wv_{m,j}}\\ \frac{\partial E}{\partial \Wv_{m,j}} & = - \frac{1}{NK} \sum_{n=1}^N (t_{n,j} - y_{n,j}) \frac{\partial y_{n,j}}{\partial \Wv_{m,j}} \end{align*} $$Compare to gradients for likelihood
$$ \begin{align*} LL(\Vv,\Wv) & = \sum_{n=1}^N \sum_{k=1}^K t_{n,k} \log g_{n,k} \text{ where } g_{n,k} = \left \{ \begin{array}{ll} \frac{\eby{k}}{1+\sum_{m=1}^{K-1} \eby{m}}; & k < K\\ \frac{1}{1+\sum_{m=1}^{K-1} \eby{m}}; & k= K\\ \end{array} \right . \\ \frac{\partial LL(\Vv,\Wv)}{\partial \Vv_{d,m}} & = \sum_{n=1}^N \sum_{k=1}^{K-1} \left ( t_{n,k} - g_{n,k} \right ) \frac{\partial y_{n,k}}{\partial \Vv_{d,m}}\\ \frac{\partial LL(\Vv,\Wv)}{\partial \Wv_{m,j}} & = \sum_{n=1}^N \sum_{k=1}^{K-1} \left ( t_{n,k} - g_{n,k} \right ) \frac{\partial y_{n,k}}{\partial \Wv_{m,j}}\\ \frac{\partial LL(\Vv,\Wv)}{\partial \Wv_{m,j}} & = \sum_{n=1}^N \left ( t_{n,j} - g_{n,j} \right ) \frac{\partial y_{n,j}}{\partial \Wv_{m,j}} \end{align*} $$So, our previously derived matrix expressions for neural networks can be used if we modify the output calculation. Here are the expressions we used for minimizing mean squared error:
$$ \begin{align*} \Zv &= h(\tilde{\Xv} \Vv)\\ \Yv &= \tilde{\Zv} \Wv\\ E &= \frac{1}{NK} \frac{1}{2} \sum (\Tv - \Yv)^2\\ \grad_\Vv E &= - \frac{1}{NK} \tilde{\Xv}^T \left ( (\Tv - \Yv) \hat{\Wv}^T \cdot (1-\Zv^2) \right )\\ \grad_\Wv E &= - \frac{1}{NK} \tilde{\Zv}^T (\Tv - \Yv) \end{align*} $$Here are the changes needed for nonlinear logistic regression, where $\Tiv$ is the matrix of indicator variables for $\Tv$:
$$ \begin{align*} \Zv &= h(\tilde{\Xv} \Vv)\\ \Yv &= \tilde{\Zv} \Wv\\ \Fv &= [e^{\Yv}, \ones{N}] \;\;\;\;\;\;\;\;\;\;\; \text{ append column of ones}\\ \Sv &= \Fv \ones{K}\;\;\;\;\;\;\;\;\;\;\;\;\; \text{ sum across columns}\\ \Gv &= \Fv / \left [ \Sv, \Sv,\ldots,\Sv \right ] \;\;\; \Sv \text{ are column vectors }\\ LL &= \sum \Tiv \log \Gv\\ \grad_\Vv LL &= \tilde{\Xv}^T \left ( (\hat{\Tiv} - \hat{\Gv}) \hat{\Wv}^T \cdot (1-\Zv^2) \right )\\ \grad_\Wv LL &= \tilde{\Zv}^T (\hat{\Tiv} - \hat{\Gv}) \end{align*} $$where $\hat{\Tiv}$ and $\hat{\Gv}$ are matrices for the target indicator variables and the neural network outputs, respectively, with the last column of values for the $K^{th}$ class removed.
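To make the matrix expressions concrete, here is a small numpy sketch that evaluates them on random toy data. Everything in it is an illustrative assumption (the tanh hidden layer for $h$, the toy shapes, the bias weights stored in the first row of $\Vv$ and $\Wv$, and the helper names `addOnes`, `forward`, `loglik`); it is not the code in neuralnetworks.py. It computes $\Zv$, $\Gv$, the log likelihood, and both gradients, and checks one entry of $\grad_\Wv LL$ by finite differences.

```python
import numpy as np

np.random.seed(0)
N, D, M, K = 10, 3, 5, 4   # samples, inputs, hidden units, classes (toy sizes)

X = np.random.uniform(-1, 1, (N, D))
T = np.random.randint(1, K + 1, (N, 1))
Ti = (T == np.arange(1, K + 1)).astype(float)     # N x K indicator variables

V = np.random.uniform(-0.1, 0.1, (1 + D, M))      # hidden weights, first row is bias
W = np.random.uniform(-0.1, 0.1, (1 + M, K - 1))  # output weights, K-1 output units

addOnes = lambda A: np.hstack((np.ones((A.shape[0], 1)), A))   # the "tilde" operation

def forward(V, W):
    Z = np.tanh(addOnes(X) @ V)                   # Z = h(X~ V)
    Y = addOnes(Z) @ W                            # Y = Z~ W
    F = np.hstack((np.exp(Y), np.ones((N, 1))))   # F = [e^Y, 1], append column of ones
    G = F / F.sum(axis=1, keepdims=True)          # G = F divided by its row sums
    return Z, G

def loglik(V, W):
    Z, G = forward(V, W)
    return np.sum(Ti * np.log(G))

Z, G = forward(V, W)
Err = Ti[:, :-1] - G[:, :-1]                      # Ti^ - G^ : Kth column removed
gradW = addOnes(Z).T @ Err                        # grad_W LL = Z~^T (Ti^ - G^)
gradV = addOnes(X).T @ ((Err @ W[1:, :].T) * (1 - Z**2))   # grad_V LL

# sanity check of one element of grad_W LL by finite differences
eps = 1e-6
Wp = W.copy(); Wp[2, 1] += eps
print(gradW[2, 1], (loglik(V, Wp) - loglik(V, W)) / eps)
```

The two printed numbers should agree to several digits, which is a quick way to convince yourself the matrix gradients match the element-wise derivation above.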
In [1]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
In [2]:
n = 500
x1 = np.linspace(5,20,n) + np.random.uniform(-2,2,n)
y1 = ((20-12.5)**2-(x1-12.5)**2) / (20-12.5)**2 * 10 + 14 + np.random.uniform(-2,2,n)
x2 = np.linspace(10,25,n) + np.random.uniform(-2,2,n)
y2 = ((x2-17.5)**2) / (25-17.5)**2 * 10 + 5.5 + np.random.uniform(-2,2,n)
angles = np.linspace(0,2*np.pi,n)
x3 = np.cos(angles) * 15 + 15 + np.random.uniform(-2,2,n)
y3 = np.sin(angles) * 15 + 15 + np.random.uniform(-2,2,n)
X = np.vstack((np.hstack((x1,x2,x3)), np.hstack((y1,y2,y3)))).T
T = np.repeat(range(1,4),n).reshape((-1,1))
colors = ['blue','red','green']
plt.figure(figsize=(6,6))
for c in range(1,4):
    mask = (T == c).flatten()
    plt.plot(X[mask,0],X[mask,1],'o',markersize=6, alpha=0.5, color=colors[c-1])
Let's try to classify this data with a 5-hidden-unit neural network using nonlinear logistic regression. In Python it is easy to do this by creating a new class, NeuralNetworkClassifier, as a subclass of NeuralNetwork, and making the required changes. The changes will be in the objectiveF and gradF functions local to the train method, and in the use method.
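As a rough idea of the change in the use method, here is a minimal sketch of converting the network outputs $\Yv$ (with $K-1$ columns) into class probabilities and predicted class labels. The function name `outputsToClasses` and the `classList` argument are assumptions for illustration only; the actual neuralnetworks.py code may organize this differently.

```python
import numpy as np

def outputsToClasses(Y, classList):
    # Y is N x (K-1); append the implicit y_K = 0 column, then normalize each row
    F = np.hstack((np.exp(Y), np.ones((Y.shape[0], 1))))
    G = F / F.sum(axis=1, keepdims=True)          # N x K class probabilities
    # predicted class is the one with the largest probability
    predicted = np.array(classList)[np.argmax(G, axis=1)].reshape(-1, 1)
    return predicted, G
```

With $\Yv$ from the forward pass and classList = [1, 2, 3], this would return a predicted class and the probability of every class for each sample, which is the kind of result the later calls to nnet.use appear to provide.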
In [8]:
!wget http://www.cs.colostate.edu/~anderson/cs480/notebooks/nn2.tar
!mv nn2.tar.1 nn2.tar # in case I already have a file named nn2.tar in this directory
!tar xvf nn2.tar
In [9]:
cat neuralnetworks.py
In [11]:
import neuralnetworks as nn
import mpl_toolkits.mplot3d as plt3
from matplotlib import cm
## If you edit neuralnetworks.py, force IPython to reload it by doing this:
# from imp import reload
# reload(nn)
nHidden = 5
nnet = nn.NeuralNetworkClassifier(2,nHidden,3) # 3 classes, will actually make 2-unit output layer
nnet.train(X,T, nIterations=1000, verbose=True)
xs = np.linspace(0,30,40)
x,y = np.meshgrid(xs,xs)
Xtest = np.vstack((x.flat,y.flat)).T
Ytest = nnet.use(Xtest)
predTest,probs,_ = nnet.use(Xtest,allOutputs=True) #discard hidden unit outputs
plt.figure(figsize=(10,10))
plt.subplot(2,2,1)
plt.plot(np.exp(-nnet.getErrorTrace()))
plt.xlabel("Epochs")
plt.ylabel("Likelihood")
plt.subplot(2,2,3)
nnet.draw()
colors = ['red','green','blue']
plt.subplot(2,2,2)
for c in range(1,4):
    mask = (T == c).flatten()
    plt.plot(X[mask,0],X[mask,1],'o',markersize=6, alpha=0.5, color=colors[c-1])
plt.subplot(2,2,4)
plt.contourf(Xtest[:,0].reshape((40,40)),Xtest[:,1].reshape((40,40)), predTest.reshape((40,40)),
             levels=(0.5,1.5,2.5,3.5),
             colors=('red','green','blue'));
In [17]:
fig = plt.figure(figsize=(20,6))
for c in range(3):
    ax = fig.add_subplot(1,3,c+1,projection='3d')
    ax.plot_surface(x,y,probs[:,c].reshape(x.shape),
                    rstride=1,cstride=1,linewidth=0.2,antialiased=False,
                    color=colors[c],alpha=0.7)
    ax.view_init(azim = 180+40,elev = 40)
    ax.set_zlabel(r"$p(C="+str(c+1)+"|x)$")
How would you plot the outputs of the hidden units?
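One possible answer, as a sketch: it assumes the third value returned by use(..., allOutputs=True) is the matrix of hidden unit outputs with one column per hidden unit, as the "#discard hidden unit outputs" comment above suggests, and reuses the x, y grid and nHidden from the earlier cells.

```python
# assumes the third returned value is the N x nHidden matrix of hidden unit outputs
_, _, Ztest = nnet.use(Xtest, allOutputs=True)
fig = plt.figure(figsize=(4 * nHidden, 4))
for h in range(nHidden):
    ax = fig.add_subplot(1, nHidden, h + 1, projection='3d')
    ax.plot_surface(x, y, Ztest[:, h].reshape(x.shape),
                    rstride=1, cstride=1, linewidth=0.2, alpha=0.7)
    ax.set_title('Hidden Unit {}'.format(h + 1))
```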
Let's repeat the experiment of classifying human activity (accelerometer) data, but now use our NeuralNetworkClassifier class to do nonlinear logistic regression. This time we will retrieve and load accelerometers.npy, a file containing a numpy array stored in numpy's binary format.
In [18]:
!wget http://www.cs.colostate.edu/~anderson/cs480/notebooks/accelerometers.npy
!mv accelerometers.npy.1 accelerometers.npy
data = np.load('accelerometers.npy')
In [19]:
data.shape
Out[19]:
In [20]:
data[0,:]
Out[20]:
In [21]:
X = data[:,1:]
T = data[:,0:1]
X.shape, T.shape
Out[21]:
In [22]:
import mlutils as ml  # for ml.partition
In [23]:
Xtrain,Ttrain,Xtest,Ttest = ml.partition(X,T,(0.8,0.2),classification=True) #stratified partitioning (by class)
In [24]:
Xtrain.shape,Ttrain.shape,Xtest.shape,Ttest.shape
Out[24]:
In [25]:
np.unique(Ttrain, return_counts=True)
Out[25]:
In [26]:
%precision 5
values,counts = np.unique(Ttrain, return_counts=True)
counts / Ttrain.shape[0]
Out[26]:
In [27]:
values,counts = np.unique(Ttest, return_counts=True)
counts / Ttest.shape[0]
Out[27]:
In [28]:
nnet = nn.NeuralNetworkClassifier(3,10,10) # 10 classes
nnet.train(Xtrain,Ttrain,nIterations=50,errorPrecision=1.e-8, verbose=True)
plt.rcParams['figure.figsize'] = (6,6)
plt.plot(np.exp(-nnet.getErrorTrace()))
plt.xlabel('Iteration')
plt.ylabel('Data Likelihood');
In [30]:
Ptrain,Prtrain,_ = nnet.use(Xtrain,allOutputs=True)
Ptest,Prtest,_ = nnet.use(Xtest,allOutputs=True)
plt.subplot(2,1,1)
plt.plot(np.hstack((Ttrain,Ptrain)), '.')
plt.legend(('Actual','Predicted'))
plt.subplot(2,1,2)
plt.plot(np.hstack((Ttest,Ptest)), '.')
plt.legend(('Actual','Predicted'));
In [31]:
cm = ml.confusionMatrix(Ttest,Ptest,np.unique(Ttest))
cm
Out[31]:
In [32]:
ml.printConfusionMatrix(cm,np.unique(Ttest))
In [34]:
nnet = nn.NeuralNetworkClassifier(3,20,10) # 10 classes
nnet.train(Xtrain,Ttrain,nIterations=50,errorPrecision=1.e-8, verbose=True)
print('Trained for',nnet.getNumberOfIterations(),'iterations')
Try training for more iterations.
In [36]:
classes = np.unique(Ttest)
Ptrain,Prtrain,_ = nnet.use(Xtrain,allOutputs=True)
Ptest,Prtest,_ = nnet.use(Xtest,allOutputs=True)
print('Percent Correct: Training',100*np.sum(Ptrain==Ttrain)/len(Ttrain), 'Testing',100*np.sum(Ptest==Ttest)/len(Ttest))
print()
ml.printConfusionMatrix( ml.confusionMatrix(Ttest,Ptest,classes), classes )
print('1-Rest 2-Coloring 3-Legos 4-Wii Tennis 5-Wii Boxing 6-0.75m/s 7-1.25m/s 8-1.75m/s, 9-2.25m/s 10-Stairs')
plt.plot(np.exp(-nnet.getErrorTrace()));
In [37]:
nnet.draw()
In [38]:
import scipy.signal as sig
def cwt(eeg,Fs,freqs,width,channelNames=None,graphics=False):
    if freqs.min() == 0:
        print('cwt: Frequencies must be greater than 0.')
        return None,None
    nChannels,nSamples = eeg.shape
    if not channelNames and graphics:
        channelNames = ['Channel {:2d}'.format(i) for i in range(nChannels)]
    nFreqs = len(freqs)
    tfrep = np.zeros((nChannels, nFreqs, nSamples))
    tfrepPhase = np.zeros((nChannels, nFreqs, nSamples))
    for ch in range(nChannels):
        print('channel',ch,' freq ',end='')
        for freqi in range(nFreqs):
            print(freqs[freqi],' ',end='')
            mag,phase = energyvec(freqs[freqi],eeg[ch,:],Fs,width)
            tfrepPhase[ch,freqi,:] = phase
            tfrep[ch,freqi,:] = mag
        print()
    return tfrep, tfrepPhase
def morletLength(Fs,f,width):
    ''' len = morletLength(Fs,f,width) '''
    dt = 1.0/Fs
    sf = f/width
    st = 1.0/(2*np.pi*sf)
    return int((3.5*st - -3.5*st)/dt)
def energyvec(f,s,Fs,width):
    '''
    function [y,phase] = energyvec(f,s,Fs,width)
    function y = energyvec(f,s,Fs,width)
    Return a vector containing the energy as a
    function of time for frequency f. The energy
    is calculated using Morlet's wavelets.
    s : signal
    Fs: sampling frequency
    width : width of Morlet wavelet (>= 5 suggested).
    '''
    dt = 1.0/Fs
    sf = f/float(width)
    st = 1.0/(2*np.pi*sf)
    t = np.arange(-3.5*st,3.5*st,step=dt)
    m = morlet(f,t,width)
    # yconv = np.convolve(s,m,mode="same")
    yconv = sig.fftconvolve(s,m,mode='same')
    lengthMorlet = len(m)
    firsthalf = int(lengthMorlet/2.0 + 0.5)
    secondhalf = lengthMorlet - firsthalf
    padtotal = len(s) - len(yconv)
    padfront = int(padtotal/2.0 + 0.5)
    padback = padtotal - padfront
    yconvNoBoundary = yconv
    y = np.abs(yconvNoBoundary)**2
    phase = np.angle(yconvNoBoundary,deg=True)
    return y,phase
######################################################################
def morlet(f,t,width):
    '''
    function y = morlet(f,t,width)
    Morlet's wavelet for frequency f and time t.
    The wavelet will be normalized so the total energy is 1.
    width defines the width of the wavelet.
    A value >= 5 is suggested.
    Ref: Tallon-Baudry et al., J. Neurosci. 15, 722-734 (1997), page 724
    Ole Jensen, August 1998
    '''
    sf = f/float(width)
    st = 1.0/(2*np.pi*sf)
    A = 1.0/np.sqrt(st*np.sqrt(2*np.pi))
    y = A*np.exp(-t**2/(2*st**2)) * np.exp(1j*2*np.pi*f*t)
    return y
In [39]:
import time
width = 75 * 1
maxFreq = 20
freqs = np.arange(0.5,maxFreq,0.5) # makes same freqs used in stft above
start = time.time()
tfrep,tfrepPhase = cwt(data[:,1:].T, 75, freqs, width)
print('CWT time: {} seconds'.format(time.time() - start))
In [40]:
plt.figure(figsize=(15,15))
plt.subplot(5,1,1)
plt.plot(data[:,1:])
plt.axis('tight')
plt.subplot(5,1,2)
plt.plot(data[:,0])
plt.text(5000,8,'1-Rest, 2-Coloring, 3-Legos, 4-Wii Tennis, 5-Boxing, 6-0.75, 7-1.25 m/s, 8-1.75, 9-2.25 m/s, 10-stairs')
plt.axis('tight')
nSensors = data.shape[1] - 1
for i in range(nSensors):
    plt.subplot(5,1,i+3)
    plt.imshow(np.log(tfrep[i,:,:]),
               interpolation='nearest',origin='lower',
               cmap=plt.cm.jet) #plt.cm.Reds)
    plt.xlabel('Seconds')
    plt.ylabel('Frequency in ' + ('$x$','$y$','$z$')[i])
    tickstep = round(len(freqs) / 5)
    plt.yticks(np.arange(len(freqs))[::tickstep],
               [str(i) for i in freqs[::tickstep]])
    plt.axis('auto')
    plt.axis('tight')
In [41]:
tfrep.shape
Out[41]:
In [42]:
X = tfrep.reshape((3*39,-1)).T
X.shape, T.shape, len(np.unique(T))
Out[42]:
In [43]:
Xtrain,Ttrain,Xtest,Ttest = ml.partition(X,T,(0.8,0.2),classification=True) #stratified partitioning (by class)
In [ ]:
nnet = nn.NeuralNetworkClassifier(X.shape[1],20,10) #10 classes
nnet.train(Xtrain,Ttrain,nIterations = 100,errorPrecision=1.e-8, verbose=True)
In [33]:
classes = np.unique(Ttest)
Ptrain,Prtrain,_ = nnet.use(Xtrain,allOutputs=True)
Ptest,Prtest,_ = nnet.use(Xtest,allOutputs=True)
print('Percent Correct: Training',100*np.sum(Ptrain==Ttrain)/len(Ttrain), 'Testing',100*np.sum(Ptest==Ttest)/len(Ttest))
print()
ml.printConfusionMatrix( ml.confusionMatrix(Ttest,Ptest,classes), classes )
print('1-Rest 2-Coloring 3-Legos 4-Wii Tennis 5-Wii Boxing 6-0.75m/s 7-1.25m/s 8-1.75m/s, 9-2.25m/s 10-Stairs')
plt.plot(np.exp(-nnet.getErrorTrace()));
In [34]:
plt.figure(figsize=(15,15))
nnet.draw()
In [35]:
plt.figure(figsize=(8,8))
plt.plot(Ttest,lw=3)
plt.plot(Ptest,'o',alpha=0.6)
plt.legend(('Actual','Predicted'),loc='best');